Dimension Reduction in Regression Estimation with Nearest Neighbor

نویسندگان

  • Benoı̂t CADRE
  • Qian DONG
چکیده

In regression with a high-dimensional predictor vector, dimension reduction methods aim at replacing the predictor by a lower dimensional version without loss of information on the regression. In this context, the so-called central mean subspace is the key of dimension reduction. The last two decades have seen the emergence of many methods to estimate the central mean subspace. In this paper, we go one step further, and we study the performances of a k-nearest neighbor type estimate of the regression function, based on an estimator of the central mean subspace. The estimate is first proved to be consistent. Improvement due to the dimension reduction step is then observed in term of its rate of convergence. All the results are distributionsfree. As an application, we give an explicit rate of convergence using the SIR method. Index Terms — Dimension Reduction; Central Mean Subspace; Nearest Neighbor Method; Semiparametric Regression; SIR Method. AMS 2000 Classification — 62H12; 62G08.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Discriminant Adaptive Nearest Neighbor Classification and Regression

Robert Tibshirani Department of Statistics University of Toronto tibs@utstat .toronto.edu Nearest neighbor classification expects the class conditional probabilities to be locally constant, and suffers from bias in high dimensions We propose a locally adaptive form of nearest neighbor classification to try to finesse this curse of dimensionality. We use a local linear discriminant analysis to e...

متن کامل

Software Cost Estimation by a New Hybrid Model of Particle Swarm Optimization and K-Nearest Neighbor Algorithms

A successful software should be finalized with determined and predetermined cost and time. Software is a production which its approximate cost is expert workforce and professionals. The most important and approximate software cost estimation (SCE) is related to the trained workforce. Creative nature of software projects and its abstract nature make extremely cost and time of projects difficult ...

متن کامل

Boosting the distance estimation: Application to the K-Nearest Neighbor Classifier

In this work we introduce a new distance estimation technique by boosting and we apply it to the K-Nearest Neighbor Classifier (KNN). Instead of applying AdaBoost to a typical classification problem, we use it for learning a distance function and the resulting distance is used into K-NN. The proposed method (Boosted Distance with Nearest Neighbor) outperforms the AdaBoost classifier when the tr...

متن کامل

Modelling Climatic Parameters Affecting the Annual Yield of Rheum Ribes Rangeland Species using Data Mining Algorithms

Identification of climatic characteristics affecting the annual yield of Rheum Ribes can be useful in management and development of this species in the rangelands. In this research, the annual yield of this species in Khorasan-Razavi province based on 74 climatic parameters during a ten-year period evaluated and affecting climatic parameters extracted using data mining methods. First, the role ...

متن کامل

Asymptotic Behaviors of Nearest Neighbor Kernel Density Estimator in Left-truncated Data

Kernel density estimators are the basic tools for density estimation in non-parametric statistics.  The k-nearest neighbor kernel estimators represent a special form of kernel density estimators, in  which  the  bandwidth  is varied depending on the location of the sample points. In this paper‎, we  initially introduce the k-nearest neighbor kernel density estimator in the random left-truncatio...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009